[SYCL][CUDA][libclc] Add bf16 builtins and optimize half builtins for fma, fmin, fmax and fmax #5724

t4c1 · 2022-03-03T15:17:42Z

For the functions fma, fmin, fmax and fabs, this PR adds bf16 builtins to libclc and optimizes the half builtins to use native half instructions when the device supports them.
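A bf16 value keeps the sign bit and 8-bit exponent of an IEEE-754 float but only 7 mantissa bits, so it is exactly the upper half of a binary32. The C sketch below is a host-side model, not the libclc code. It assumes bf16 is carried in a `uint16_t` storage type and bf16x2 in a `uint32_t`; the names `bf16_storage`, `bf16_to_float`, `float_to_bf16`, `bf16_fabs` and `bf16x2_fabs` are hypothetical. It shows why fabs needs no conversion at all, only masking of the sign bit(s).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical storage types: bf16 as raw 16 bits, bf16x2 as two packed lanes. */
typedef uint16_t bf16_storage;
typedef uint32_t bf16x2_storage;

/* Widening is exact: the bf16 bits become the top half of a binary32. */
static float bf16_to_float(bf16_storage h) {
    uint32_t bits = (uint32_t)h << 16;
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Narrowing with round-to-nearest-even; NaNs are kept quiet. */
static bf16_storage float_to_bf16(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    if ((bits & 0x7fffffffu) > 0x7f800000u)
        return (bf16_storage)((bits >> 16) | 0x0040u);
    bits += 0x7fffu + ((bits >> 16) & 1u);
    return (bf16_storage)(bits >> 16);
}

/* fabs only clears the sign bit, so no float round trip is needed. */
static bf16_storage bf16_fabs(bf16_storage h) {
    return (bf16_storage)(h & 0x7fffu);
}

/* The packed variant clears the sign bit of both 16-bit lanes at once. */
static bf16x2_storage bf16x2_fabs(bf16x2_storage v) {
    return v & 0x7fff7fffu;
}
```

The packed form is why a bf16x2 overload can be as cheap as the scalar one: a single mask covers both lanes.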

This PR also contains some changes (everything in the clang folder) that have been merged into upstream LLVM since the last pulldown and are required to build it. Something went wrong when these upstream patches were merged, so only parts of them were pulled in at first; the changes in this PR are the remainder of: https://reviews.llvm.org/D118977 https://reviews.llvm.org/D117887 https://reviews.llvm.org/D119157

Tests for half changes are in intel/llvm-test-suite#880. Tests for bf16 implementations will be added together with adding support for these to runtime in future PRs.


Adds support for the following builtins:
- abs, neg:
  - .bf16
  - .bf16x2
- min, max:
  - {.ftz}{.NaN}{.xorsign.abs}.f16
  - {.ftz}{.NaN}{.xorsign.abs}.f16x2
  - {.NaN}{.xorsign.abs}.bf16
  - {.NaN}{.xorsign.abs}.bf16x2
  - {.ftz}{.NaN}{.xorsign.abs}.f32

Differential Revision: https://reviews.llvm.org/D117887
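The `.NaN` and `.xorsign.abs` modifiers change the result in ways that are easy to mix up. The following is a float32 model of `min`, written from my reading of the PTX ISA description rather than taken from the patch: by default a NaN operand is ignored (like C `fminf`), `.NaN` propagates a canonical NaN, and `.xorsign.abs` returns the smaller magnitude with the sign set to the XOR of the input signs. The function names are hypothetical, and `.ftz` is not modelled.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

static uint32_t float_bits(float f) { uint32_t u; memcpy(&u, &f, sizeof u); return u; }
static float bits_float(uint32_t u) { float f; memcpy(&f, &u, sizeof f); return f; }

/* Default min: a single NaN operand is ignored, as in C fminf. */
static float min_default(float a, float b) {
    if (isnan(a)) return b;
    if (isnan(b)) return a;
    return a < b ? a : b;
}

/* min.NaN: any NaN operand yields a (canonical) NaN. */
static float min_nan(float a, float b) {
    if (isnan(a) || isnan(b)) return NAN;
    return a < b ? a : b;
}

/* min.xorsign.abs: smaller magnitude, sign = XOR of the operand signs. */
static float min_xorsign_abs(float a, float b) {
    uint32_t sign = (float_bits(a) ^ float_bits(b)) & 0x80000000u;
    float mag = min_default(bits_float(float_bits(a) & 0x7fffffffu),
                            bits_float(float_bits(b) & 0x7fffffffu));
    return bits_float(float_bits(mag) | sign);
}
```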

NOTE: this is a follow-up commit with the missing clang-side changes.

This patch adds builtins/intrinsics for the following variants of FMA:
- f16, f16x2
  - rn
  - rn_ftz
  - rn_sat
  - rn_ftz_sat
  - rn_relu
  - rn_ftz_relu
- bf16, bf16x2
  - rn
  - rn_relu

ptxas (Cuda compilation tools, release 11.0, V11.0.194) is happy with the generated assembly.

Differential Revision: https://reviews.llvm.org/D118977
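To make the suffixes concrete, here is a small float32 model of the `rn_relu` and `rn_sat` post-processing, based on my reading of the PTX ISA: `.relu` clamps negative results to 0 and turns NaN into a canonical NaN, while `.sat` clamps to [0.0, 1.0] and flushes NaN to +0.0. This is an assumption-laden sketch. The real instructions round in f16/bf16 precision whereas this model works in float32, and the hypothetical `fma_rn_model` helper approximates a fused multiply-add through double arithmetic instead of calling a real fma.

```c
#include <math.h>

/* Stand-in for fma: a float*float product is exact in double, so only the
   addition and the final narrowing round. Close to, but not guaranteed to be
   bit-identical with, a single-rounding fused multiply-add. */
static float fma_rn_model(float a, float b, float c) {
    return (float)((double)a * (double)b + (double)c);
}

/* .relu: negative results are clamped to 0, NaN becomes a canonical NaN. */
static float fma_rn_relu_model(float a, float b, float c) {
    float r = fma_rn_model(a, b, c);
    if (isnan(r)) return NAN;
    return r < 0.0f ? 0.0f : r;
}

/* .sat: results are clamped to [0.0, 1.0], NaN is flushed to +0.0. */
static float fma_rn_sat_model(float a, float b, float c) {
    float r = fma_rn_model(a, b, c);
    if (isnan(r) || r < 0.0f) return 0.0f;
    return r > 1.0f ? 1.0f : r;
}
```

Note the different NaN treatment: `relu` keeps a NaN visible, while `sat` silently turns it into zero.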

NOTE: this is a follow-up commit with the missing clang-side changes.

This patch adds builtins and intrinsics for the f16 and f16x2 variants of the ex2 instruction. These two variants were added in PTX 7.0 and are supported by sm_75 and above. Note that this isn't wired to the exp2 LLVM intrinsic because the ex2 instruction is only available in its approx variant. Running ptxas on the assembly generated by the test f16-ex2.ll works as expected.

Differential Revision: https://reviews.llvm.org/D119157

libclc/ptx-nvidiacl/libspirv/math/fabs.cl

libclc/ptx-nvidiacl/libspirv/math/fmax.cl

libclc/ptx-nvidiacl/libspirv/math/fmin.cl

libclc/ptx-nvidiacl/libspirv/math/native_exp2.cl


bader

libclc changes look good to me.

clang/test/CodeGen/builtins-nvptx-native-half-type.c

clang/test/CodeGen/builtins-nvptx.c

t4c1 · 2022-03-09T13:27:02Z

I just removed the changes to native_exp2, as that is being implemented in a slightly different way in #5747.

smanna12

FE changes LGTM, as per comment: #5724 (comment)

These changes (everything in the clang folder) are already merged upstream. I just added them to this PR because they are required to build it. They will be part of the next pulldown.

This PR also contains some changes (everything in clang folder) that have been merged in upstream llvm since last pulldown and are required for building it.

@t4c1, could you please add the upstream links?

t4c1 · 2022-03-14T07:41:14Z

Done - updated the PR description.

t4c1 and others added 17 commits February 18, 2022 08:29

added fma,fmax,fmin for half, bf16 and bf16x2 and approx exp2 for bf16 and bf16x2

1a53588

Added missing includes and forward declarations

32789b4

removed exp2 bf16 implementations that do not have builtins

f20a4bd

added optimized half2 overloads for fma, fmin and fmax

a780a06

added fma_relu for half, bf16 and bf16x2

21d98c1

[LIBCL][NVPTX] Add support for half tys native_exp2

59758bd

removed redundant undefs

b362369

changed prefix for fma_relu to __clc

86a6f42

changed bf16 builtins to use __clc prefix

0802831

added bf16 fabs builtins

be46eb4

Merge branch 'sycl' into libclc_bf16

c63c5ba

bugfixes

5ea7a87

Merge branch 'libclc_bf16' into libclc_bf16_tmp

ab922cc

Merge branch 'jakub/native_exp2' into libclc_bf16_tmp

9bbf569

t4c1 requested review from a team and bader as code owners March 3, 2022 15:17

bader changed the title ~~[SYCL][CUDA][libclc] add bf16 builtins and optimize half builtins for fma, fmin, fmax and fmax~~ [SYCL][CUDA][libclc] Add bf16 builtins and optimize half builtins for fma, fmin, fmax and fmax Mar 3, 2022

This was referenced Mar 7, 2022

[SYCL][CUDA] Add bf16 builtins operating on storage types #5748

Merged

[SYCL] Add tests for some half builtins intel/llvm-test-suite#880

Merged

hdelan mentioned this pull request Mar 7, 2022

[SYCL] Add fma_relu extension #5749

Closed

bader reviewed Mar 9, 2022


t4c1 and others added 2 commits March 9, 2022 11:56

Apply suggestions from code review

1b49010

Apply review suggestions. Co-authored-by: Alexey Bader <alexey.bader@intel.com>

change fmax and fmin fallback implementation back to libdevice float functions

50945d0
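The float fallback for fmin/fmax rests on a simple property: every half value converts to float exactly, and fmin/fmax always return one of their inputs (or a NaN), so computing in float and converting back cannot introduce rounding error. The sketch below is a host-side C illustration of that fallback for raw binary16 bits, assuming a `uint16_t` storage type. `half_to_float` and `half_fmax_fallback` are hypothetical names, and the plain comparison stands in for the libdevice float function that the real code calls.

```c
#include <stdint.h>
#include <string.h>

/* Exact binary16 -> binary32 widening of raw half bits. */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1fu;
    uint32_t man = h & 0x3ffu;
    uint32_t bits;
    if (exp == 0x1fu) {                   /* infinity or NaN */
        bits = sign | 0x7f800000u | (man << 13);
    } else if (exp != 0) {                /* normal: rebias 15 -> 127 */
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    } else if (man == 0) {                /* signed zero */
        bits = sign;
    } else {                              /* subnormal: normalise mantissa */
        uint32_t shift = 0;
        while ((man & 0x400u) == 0) { man <<= 1; shift++; }
        bits = sign | ((113u - shift) << 23) | ((man & 0x3ffu) << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Fallback fmax on raw half bits: compare the exactly widened values and
   return the matching input, so narrowing back to half is lossless. */
static uint16_t half_fmax_fallback(uint16_t a, uint16_t b) {
    float fa = half_to_float(a), fb = half_to_float(b);
    if (fa != fa) return b;               /* NaN operand ignored, as in fmaxf */
    if (fb != fb) return a;
    return fa >= fb ? a : b;
}
```

Because the result is always one of the inputs, the fallback gives the same answer as a native f16 max on devices that have one; only the speed differs.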

bader previously approved these changes Mar 9, 2022


fix libdevice builtin names

4658aac

t4c1 dismissed bader’s stale review via 4658aac March 9, 2022 11:39

bader previously approved these changes Mar 9, 2022


smanna12 reviewed Mar 9, 2022


clang/test/CodeGen/builtins-nvptx-native-half-type.c

smanna12 reviewed Mar 9, 2022


clang/test/CodeGen/builtins-nvptx.c

removed native_exp2.cl

cab6150

t4c1 dismissed bader’s stale review via cab6150 March 9, 2022 13:25

smanna12 approved these changes Mar 9, 2022


bader merged commit 62651dd into intel:sycl Mar 14, 2022
